1  Introduction

In this paper we propose a mathematical framework for unifying
and generalizing the three principal data models, i.e., the
relational, hierarchical and network models ([U]). Until recently
most work on database theory has focused on the relational model
([C1]), mainly due to its elegance and mathematical simplicity
compared to the other models. Some of this work has pointed out
various disadvantages of the relational model, among them its lack
of semantics ([C2], [HM], [SmSm]) and the fact that it forces the
data to have a flat structure that the real data does not always
have.

Several recent papers have addressed this problem by trying to
find a more general mathematical framework. Specifically, Jacobs [J]
describes “database logic,” a mathematical model for databases that
claims to generalize all three principal data models. Also, Hull and
Yap [HY] describe the “format model.” In their model, they view
database schemes as trees, where each leaf represents data, and each
internal node represents some connection between the data.

Both these models are unsatisfactory in their ability to
restructure data, i.e., the ability to query the database. While
Hull and Yap ignore the issue of a data manipulation language,
Jacobs’ treatment is overkill: his query language enables one to
write noncomputable queries [V].

Furthermore, both approaches fail to model a significant aspect
of hierarchical and network database management systems, which is
the ability to use virtual records. Virtual records are essentially
pointers to physical records, and they are used to avoid redundancy
in the database [U]. Note that virtual records introduce cyclicity
not only at the schema level but also at the instance level.


In  the  model  we  propose  here  a  database 
scheme  is  an  arbitrary  directed  graph.  As  in  the 
format  model,  leaves  (i.e.,  nodes  with  no  outgoing 
edges)  represent  data,  and  internal  nodes  repre- 
Figure 2  Format Representation of Fig. 1.


Example 1. Assume we are given the PERSON-PARENT relation shown
in Fig. 1. We can represent the structure of this relation in the
format model by the format in Fig. 2. This format has two nodes u
and v of type □ that correspond to the attributes of the relation,
and one node w of type Q that connects the pairs of related
attributes.

An instance of this format will be an assignment of values to
each node, as follows. We shall use the notation I(u) to mean the
set of values assigned to the node u by the instance I. We could
just take as the instance of a node a set of elements from the
underlying domain, or tuples or sets taken from the instances of
the node’s successors. If we were to use this approach, we would
not be able to deal with cycles in the format, and even if the
format were acyclic, we would lose the ability to represent pointers
to other nodes in our model, since the data would be represented
explicitly at each node. What we do instead is have the instance of
each node consist of a set of l-values, with corresponding r-values.

Intuitively, r-values constitute the data space, and the l-values
constitute the address space. The instance of a node consists of a
set of l-values, with an r-value assigned to each of them. Formally,
the l-values are elements of a fixed set L (usually taken to be the
natural numbers). We require that the instances of different nodes
be disjoint. We also have a function r on L that assigns r-values
to these l-values, and we require that the r-values be of the
correct form, depending on the type of the node.


Definition 2. An instance of a schema $S = (G, \mu)$ consists of
a mapping $I$ from $V$ to $\mathcal{P}_{\mathrm{fin}}(L)$ (all finite
subsets of $L$), and a mapping $r$ from $\bigcup_{v \in V} I(v)$;
$r$ maps l-values to their r-values. If $v \neq w$, then $I(v)$ and
$I(w)$ must be disjoint. For each node $v$ in $G$, $I(v)$ must
satisfy


(1) If $\mu(v) = \Box$, then for each $l \in I(v)$, $r(l)$ must
be in $D$.

(2) If $\mu(v) = Q$ and $v_1, \ldots, v_n$ are the successors of
$v$, then for any $l \in I(v)$, $r(l)$ must be a tuple
$(l_1, \ldots, l_n)$ such that for each $i$ between 1 and $n$,
$l_i$ is an element of $I(v_i)$. An l-value in $I(v_i)$ can appear
in any number of tuples, including none of them.

(3) If $\mu(v) = O$ and $w$ is $v$’s successor, then for each
$l \in I(v)$, $r(l)$ must be a subset of $I(w)$.


If $l$ is an l-value, $r(l)$ is called the r-value of $l$.
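The conditions of Definition 2 can be checked mechanically. The Python sketch below is only an illustration under assumptions made here: schemas are encoded as dictionaries, the names `BASIC`, `COMP`, `COLL` and `is_instance` are hypothetical, and the sample l-values for the PERSON-PARENT data are chosen arbitrarily (they are not taken from Fig. 3).

```python
# Hypothetical encoding; BASIC, COMP, COLL stand in for the three node types.
BASIC, COMP, COLL = "basic", "composition", "collection"

def is_instance(succ, mu, I, r, domain):
    """Check that (I, r) satisfies Definition 2 for the schema (succ, mu).

    succ maps each node to the ordered list of its successors,
    I maps each node to a finite set of l-values (ints here),
    r maps each l-value to its r-value, domain is the set D.
    """
    seen = set()
    for v in succ:
        if seen & I[v]:
            return False            # instances of different nodes must be disjoint
        seen |= I[v]
    for v, children in succ.items():
        for l in I[v]:
            if l not in r:
                return False        # every l-value needs an r-value
            if mu[v] == BASIC:
                ok = r[l] in domain
            elif mu[v] == COMP:     # tuple with i-th component in I(v_i)
                ok = (isinstance(r[l], tuple)
                      and len(r[l]) == len(children)
                      and all(x in I[c] for x, c in zip(r[l], children)))
            else:                   # COLL: subset of the successor's instance
                ok = isinstance(r[l], frozenset) and r[l] <= I[children[0]]
            if not ok:
                return False
    return True

# Assumed l-values for the PERSON-PARENT relation (illustrative only).
succ = {"u": [], "v": [], "w": ["u", "v"]}
mu = {"u": BASIC, "v": BASIC, "w": COMP}
D = {"Rehoboam", "Solomon", "David", "Batsheba", "Jesse"}
I = {"u": {1, 2, 3}, "v": {4, 5, 6, 7}, "w": {8, 9, 10, 11}}
r = {1: "Rehoboam", 2: "Solomon", 3: "David",
     4: "Solomon", 5: "David", 6: "Batsheba", 7: "Jesse",
     8: (1, 4), 9: (2, 5), 10: (2, 6), 11: (3, 7)}
```

Because tuples at w store l-values rather than names, the same person can be referenced from several tuples without copying the data, which is the point of separating the address space from the data space.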
Figure 3  Instance for the First Example
sent connections between the data. While it is not hard to model
cyclicity at the schema level, it is not quite apparent how to do
it at the instance level without running into cyclic definitions.
Our solution is to keep the obvious distinction between memory
locations and their content. Thus, instances in our model consist
of r-values, which constitute the data space, and l-values, which
constitute the address space. This mechanism enables us to give
semantics to instances in a well-defined way.

A data model consists of several components (see [TL]). The first
is the database structure mentioned above, which describes the
static portion of the database. The second component is a way to
specify integrity constraints on the database that restrict the
allowed instances of the schema. We shall describe a logic in which
integrity constraints can be specified. Unlike Jacobs’ logic, our
logic is effective. That is, given a database and a sentence in the
logic, one can test effectively whether the sentence is true in the
database or not.

The third component will be a way to restructure data, in order
to describe user views, query languages, etc. We describe two such
mechanisms, a logical, i.e., non-procedural, query language and an
algebraic, i.e., procedural, query language that are analogous to
Codd’s tuple calculus and relational algebra, and we prove them
equivalent. These languages have a novel feature: not only can they
access a non-flat data structure, i.e., a hierarchy, but the answers
they produce do not have to be flat either. Thus, the language
really does have the ability to restructure data and not only to
retrieve it.

2  The  Format  Model 

In our model, a schema is an arbitrary directed graph, with a
type associated with each node. These types can be as follows.

(1) Basic type, written □. Nodes of this type
contain the data stored in the database.

(2)  Composition,  written  Q.  Nodes  of  this  type 
contain  tuples  whose  components  are  taken 
from  the  successors  of  the  node. 

(3)  Collection,  written  O.  Nodes  of  this  type 
contain  sets,  all  of  whose  elements  are  taken 
from  the  node’s  successor. 

Formally, 

Definition 1. A schema is a directed graph $G$, together with a
function $\mu$ that assigns a type to each node of $G$. $\mu$ is a
function from $V$, the set of nodes of $G$, to the set
$\{\Box, O, Q\}$. $\mu(v)$ can be $\Box$ only when $v$ has no
successors; it can be $Q$ only when $v$ has at least one successor;
$O$ only when $v$ has exactly one successor.
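As a small illustration of Definition 1, the following Python sketch checks the three typing conditions on a graph. The dictionary encoding and the names `BASIC`, `COMP`, `COLL` and `is_schema` are assumptions introduced here for the example, not notation from the paper.

```python
# Hypothetical stand-ins for the basic, composition and collection types.
BASIC, COMP, COLL = "basic", "composition", "collection"

def is_schema(succ, mu):
    """Check the conditions of Definition 1.

    succ maps each node to the ordered list of its successors;
    mu maps each node to one of BASIC, COMP, COLL.
    """
    if set(succ) != set(mu):
        return False
    for v, children in succ.items():
        if any(c not in succ for c in children):
            return False            # edge to a node outside the graph
        n = len(children)
        if mu[v] == BASIC and n != 0:
            return False            # basic nodes have no successors
        if mu[v] == COMP and n < 1:
            return False            # composition needs at least one successor
        if mu[v] == COLL and n != 1:
            return False            # collection needs exactly one successor
        if mu[v] not in (BASIC, COMP, COLL):
            return False
    return True

# The format of Example 1: two leaves and one composition node over them.
example1 = ({"u": [], "v": [], "w": ["u", "v"]},
            {"u": BASIC, "v": BASIC, "w": COMP})

# A cyclic schema is allowed: a collection node and a composition node
# that point at each other.
cyclic = ({"a": ["b"], "b": ["a"]}, {"a": COLL, "b": COMP})
```

Note that nothing in the check forbids cycles, in line with the statement that a schema is an arbitrary directed graph.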

For each node $v$ of type Q we have an ordering of its successors,
so that we can refer uniquely to “the $k$-th successor” of $v$.
Note that we do not have pointer types explicitly. However, if we
wanted them in the model, we could describe them as Q-nodes with
exactly one successor.


PERSON      PARENT
Rehoboam    Solomon
Solomon     David
Solomon     Batsheba
David       Jesse

Figure 1  PERSON-PARENT Relation.




Figure  4  Hierarchy  for  the  Genealogy. 
Figure 5  Instance of the Format in Example 2.

In  Fig.  3  we  show  an  instance  of  the  format 
in  Fig.  2  corresponding  to  the  data  in  Fig.  1. 

Example 2. We could also be given the genealogy as a hierarchy,
as shown in Fig. 4. This could be represented in the format model
as the format in Fig. 4, with the instance in Fig. 5. In general,
given a hierarchy, we can convert it to a format as follows. Let
$R_1, \ldots, R_n$ be the nodes in the hierarchy (the logical record
types). For each $R_i$ we have a corresponding Q-node $v_i$ in the
format. For each field of the logical record type $R_i$, $v_i$ has
one successor of type □. The links are represented as follows. If
$L_i$ is a link in the hierarchy, with owner $R_j$ and member $R_k$,
we have in the format a node $w_i$ of type O, that is a successor of
$R_k$, and whose successor is $R_j$.


Figure  6  Another  Representation  of  the  Genealogy. 
Figure 7  Instance of the Format in Figure 6.

Example 3. We can also use the format model to represent data
structured in ways that do not correspond to any of the standard
data models. For example, we could represent the genealogy by the
format in Fig. 6, and the instance in Fig. 7.

3  Logic 

We define a calculus on formats in two stages. In this section we
define a logic on formats. Then in the next section, we use this
logic to define queries on formats. These queries will correspond
to tuple calculus expressions in the relational model. We can also
use the calculus to describe integrity constraints in the database.

Each variable in our logic has a fixed sort, where the sorts are
nodes in the graph. The sorts restrict the possible values the
variable may take. For example, if $x$ is a variable of sort $v$,
$x$ can take only values in $I(v)$. In the future, we will usually
subscript the variable with its sort, e.g., $x_v$. Though the values
of variables are always l-values, we shall say “the l-value of
$x_v$” when we mean the value of $x_v$, and “the r-value of $x_v$”
when we mean the r-value of the value of $x_v$.

Definition 3. An atomic formula is one of

(1) $x_v \mathrel{\pi_i} y_w$, meaning that the l-value of $x_v$ is
the $i$-th component of the r-value of $y_w$.

(2) $x_v \in y_w$, meaning that the l-value of $x_v$ is a member of
the r-value of $y_w$.

(3) $x_v =_l y_w$, meaning that the l-values of $x_v$ and $y_w$ are
equal.

(4) $x_v =_r y_w$, meaning that the r-values of $x_v$ and $y_w$ are
equal.

(5) $x_v =_r d$, where $d$ is an element of the domain $D$, meaning
that the r-value of $x_v$ is $d$.

We then define well formed formulas in the usual way.
$\models_I \phi(l_1, \ldots, l_n)$ will mean that $\phi$ is
satisfied by $l_1, \ldots, l_n$ in the instance $I$. This is defined
as follows.

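Definition 4. Let $\phi(x^1_{v_1}, \ldots, x^n_{v_n})$ be a formula
with free variables $x^1_{v_1}, \ldots, x^n_{v_n}$. Let
$l_1, \ldots, l_n$ be l-values, where for each $i$,
$l_i \in I(v_i)$. Then $\models_I \phi(l_1, \ldots, l_n)$ is defined
by induction on the size of $\phi$, as follows.

(1a) If $\phi$ is $x^i_v \mathrel{\pi_k} x^j_w$, then
$\models_I (x^i_v \mathrel{\pi_k} x^j_w)(l_1, \ldots, l_n)$ iff $w$
is of type Q with at least $k$ successors, and $l_i$ is the $k$-th
component of $r(l_j)$.

(1b) If $\phi$ is $x^i_v \in x^j_w$, then
$\models_I (x^i_v \in x^j_w)(l_1, \ldots, l_n)$ iff $w$ is of type O
and $l_i$ is an element of $r(l_j)$.

(1c) If $\phi$ is $x^i_v =_l x^j_w$, then
$\models_I (x^i_v =_l x^j_w)(l_1, \ldots, l_n)$ iff $l_i = l_j$.

(1d) If $\phi$ is $x^i_v =_r x^j_w$, then
$\models_I (x^i_v =_r x^j_w)(l_1, \ldots, l_n)$ iff
$r(l_i) = r(l_j)$.

(1e) If $\phi$ is $x^i_v =_r d$, where $d$ is an element of $D$,
then $\models_I (x^i_v =_r d)(l_1, \ldots, l_n)$ iff $r(l_i) = d$.

(2) If $\phi$ is $\phi_1 \lor \phi_2$ or $\lnot \phi_1$, then the
definition is the usual one.

(3) If $\phi$ is a formula with free variables
$x^1_{v_1}, \ldots, x^n_{v_n}$, and $y_w$, then
$\models_I ((\exists y_w)\phi)(l_1, \ldots, l_n)$ iff there is an
$l$ in $I(w)$ such that $\models_I \phi(l_1, \ldots, l_n, l)$.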

Example 4. The formula $x_u \mathrel{\pi_1} y_v$ says that the
l-value of $x_u$ is equal to the first component of the r-value of
$y_v$. It is satisfied in the instance of Example 3 by the
$(x_u, y_v)$ pairs (1,7), (2,8), (3,9), (4,10), and (5,11).

Lemma 1. If $\phi(x^1_{v_1}, \ldots, x^n_{v_n})$ is a formula with
free variables $x^1_{v_1}, \ldots, x^n_{v_n}$, and
$l_1, \ldots, l_n$ are l-values, where for each $i$,
$l_i \in I(v_i)$, then $\models_I \phi(l_1, \ldots, l_n)$ can be
determined effectively.

Proof: This is shown by induction on the size of the formula. For
atomic formulas testing for satisfaction is straightforward. Testing
for disjunction and negation is also clearly effective, and the
result for quantification follows from the fact that all instances
are finite. ∎
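The induction in the proof of Lemma 1 translates directly into a recursive evaluator. The sketch below is illustrative only: formulas are encoded as nested Python tuples, the tags `"pi"`, `"in"`, `"eq_l"`, `"eq_r"`, `"eq_d"`, `"or"`, `"not"`, `"exists"` and the function names are assumptions made here, and the sample instance uses arbitrarily chosen l-values for the PERSON-PARENT data.

```python
BASIC, COMP, COLL = "basic", "composition", "collection"  # hypothetical type tags

def node_of(I, l):
    """Return the node whose instance contains l (instances are disjoint)."""
    return next(v for v, ls in I.items() if l in ls)

def holds(phi, env, succ, mu, I, r):
    """Decide satisfaction of phi; env maps variable names to l-values."""
    op = phi[0]
    if op == "pi":          # ("pi", k, x, y): x is the k-th component of r(y)
        _, k, x, y = phi
        w = node_of(I, env[y])
        return mu[w] == COMP and len(succ[w]) >= k and r[env[y]][k - 1] == env[x]
    if op == "in":          # ("in", x, y): x is a member of r(y)
        _, x, y = phi
        return mu[node_of(I, env[y])] == COLL and env[x] in r[env[y]]
    if op == "eq_l":        # equal l-values
        return env[phi[1]] == env[phi[2]]
    if op == "eq_r":        # equal r-values
        return r[env[phi[1]]] == r[env[phi[2]]]
    if op == "eq_d":        # ("eq_d", x, d): the r-value of x is the constant d
        return r[env[phi[1]]] == phi[2]
    if op == "or":
        return (holds(phi[1], env, succ, mu, I, r)
                or holds(phi[2], env, succ, mu, I, r))
    if op == "not":
        return not holds(phi[1], env, succ, mu, I, r)
    if op == "exists":      # ("exists", y, w, body): y ranges over the finite I[w]
        _, y, w, body = phi
        return any(holds(body, {**env, y: l}, succ, mu, I, r) for l in I[w])
    raise ValueError(f"unknown connective: {op}")

# Assumed sample instance (l-values chosen for illustration).
succ = {"u": [], "v": [], "w": ["u", "v"]}
mu = {"u": BASIC, "v": BASIC, "w": COMP}
I = {"u": {1, 2, 3}, "v": {4, 5, 6, 7}, "w": {8, 9, 10, 11}}
r = {1: "Rehoboam", 2: "Solomon", 3: "David",
     4: "Solomon", 5: "David", 6: "Batsheba", 7: "Jesse",
     8: (1, 4), 9: (2, 5), 10: (2, 6), 11: (3, 7)}

# "x occurs as the second component of some tuple at w"
is_some_parent = ("exists", "y", "w", ("pi", 2, "x", "y"))
```

The quantifier case terminates because it iterates over the finite set I[w], which is exactly the fact the proof appeals to.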

4  Queries 

In the relational model the result of a query is a relation. We
might therefore in analogy expect that the result of a query on the
format model will be a format, i.e., a schema and an instance of it.
This approach generalizes the relational algebra approach and may
also suggest query languages for the other data models.

We modify this approach a bit by not requiring that the query’s
schema be an independent schema; instead we allow the successors of
nodes in the query to be nodes in the database. One reason for this
is that we may want our answer to have references to the database
rather than copies of large structures. Another reason is to
simplify the definitions of the algebraic operations. We shall want
each algebraic operation to be the result of some query, but on the
other hand we would like to be able to simulate an arbitrary “safe”
query by a sequence of algebraic operations, and if each operation
had to create a completely new format, the definitions would be
unnecessarily complicated. Notice that if the query were a “normal
query,” i.e., one which created a completely new structure, the
corresponding algebraic operations would first copy the required
nodes, and then would combine l-values only from these new nodes.

The natural thing to do now, using the analogy with the tuple
calculus in the relational model, is to take some formula $\phi$ and
let the instance be those things satisfying it.

There are two problems with this approach. The first is that the
queries cannot build the l-values by themselves: such a formula just
says when a given instance satisfies it, but gives no way to
construct such an instance. One solution is to prevent the query
from referring directly to l-values, and allow them to be referred
to only by their r-values. We could then find all possible r-values
that could appear in the result, assign them arbitrary l-values, and
try to show that the result is unique up to isomorphism.
Another problem with this approach is dealing with cyclicity. We
need the ability to refer directly to l-values in order to make use
of the cyclicity, but even then the result of the query will not
always be defined uniquely. For example, if the query schema is the
format in Fig. 8, and the query just specifies that $I(u)$ and
$I(v)$ each contain at least two different l-values, then Figs. 9
and 10 show two incomparable instances, and we have no way to choose
between them. Our solution has been to restrict the queries to ones
not containing cycles, while allowing cyclicity in the database, and
allowing the query to refer explicitly to l-values in the database.

The  formal  definition  of  a  query  is 

Definition 5. Given a database schema $S = (G, \mu)$, and an
instance $I$, a query $Q = (S', \Phi)$ on the database consists of

(1) A set of nodes and directed edges with types associated with
each node, $S' = (G', \mu')$, such that

(a) $G'$ is a DAG, and edges can also connect nodes of $G'$ to nodes
of $G$.

(b) $(G \cup G', \mu \cup \mu')$ is a schema, which we shall call
$S^* = (G^*, \mu^*)$.


Figure  8 


(2) The fact that $G'$ is a DAG implies that there is an order
$\prec$ on the nodes of $G'$, such that if $v$ and $w$ are nodes of
$G'$ and $v$ is a successor of $w$, then $v \prec w$. Let $\prec$ be
a fixed ordering of the nodes with this property.

(3) $\Phi$ is a set of formulae, one for each node $v$ of $G'$. The
formula $\phi_v$ corresponding to $v$ must satisfy

(a) There is only one free variable in $\phi_v$, and it is of sort
$v$.

(b) All other variables are bound, and their sorts are either nodes
of the database, or are nodes that precede $v$ under $\prec$.

Since the query is now acyclic, we can create the result of the
query “bottom-up,” i.e., we define the result of the query at each
node in terms of the results at its successors. We define the result
of the query by the following inductive construction. Assume that
$I(w)$ has been defined for all nodes $w$ in the query that precede
$v$ under $\prec$. We then say that $r$ is a candidate r-value for
$v$ if by setting $r(l) = r$, and letting $I(v)$ contain the single
l-value $l$, we get $\models_I \phi_v(l)$ ($l$ is an arbitrary
unused l-value). The construction of $I(v)$ is as follows. Let $R$
be the set of all candidate r-values for $v$. For each $r \in R$
arbitrarily select a different unused l-value $l_r$, set
$r(l_r) = r$, and let $I(v)$ consist of all these l-values. Repeat
this for each node $v$ in the order given by $\prec$.

We now give the formal definition of the result of a query. We
start with the definition of candidate r-value.

Definition 6. Let $\phi_v(x_v)$ be a formula with free variable
$x_v$ and let $I$ be an instance of some of the nodes in the format,
including the sorts of all the bound variables in $\phi_v$. Let $l$
be an l-value that is unused in $I$. We say that $r$ is a candidate
r-value for $v$ if by setting $r(l) = r$, we get
$\models_{I \cup \{l\}} \phi_v(l)$. Let $R$ be the set of all
candidate r-values for $v$. For each $r \in R$, select a different
unused l-value $l_r$ (the choice of l-values is arbitrary), and set
$r(l_r) = r$. Then $[l \mid \phi_v(l)]$ is defined to be
$\{l_r \mid r \in R\}$.


Lemma 2. In Definition 6, assume that $R$ is finite. Then if
$I(v)$ is defined to be $[l \mid \phi_v(l)]$, the following hold.

(i) For each $l$ in $I(v)$, $\models_I \phi_v(l)$.

(ii) If we take different unused l-values in the definition, we get
an isomorphic instance $I'$.

(iii) There are no two different l-values in $I(v)$,
$l_1 \neq l_2$, with $r(l_1) = r(l_2)$.

(iv) $I(v)$ is maximal satisfying (i)-(iii). ∎

Definition 7. The result of the query is an instance $I^*$ of the
schema $S^*$. It is defined by induction as follows. If $v$ is a
node of the database, we define $I^*(v) = I(v)$. If $v$ is a node of
the query, assume that we have already defined $I^*(w)$ for any node
$w$ that precedes $v$ under $\prec$. Then $I^*(v)$ is defined as
$[l \mid \phi_v(l)]$.


Figure  11  Query  on  the  Genealogy  Database. 
Figure 12  Possible Result of the Query.

Example 5. Assume that the database is the genealogy format of
Fig. 6 with the instance of Fig. 7. The query will consist of the
node $u'$ in Fig. 11, with formula
$\phi_{u'}(x_{u'}) = (\exists y_u)(x_{u'} =_r y_u)$. In other words,
we want $I(u')$ to be a copy of $I(u)$ (removing duplicated values,
if $I(u)$ had any). To answer the query we do the following. First,
take an unused l-value, say 17. Now look for all possible r-values
$r$ (in this case, elements of $D$), such that if we set
$r(17) = r$, and $I(u') = \{17\}$, we get $\models \phi_{u'}(17)$.
The set $R$ of candidate r-values is $R = \{$Rehoboam, Solomon,
David, Batsheba, Jesse$\}$, and therefore the result of the query
(up to isomorphism) is as shown in Fig. 12.
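The construction used in Example 5 can be sketched in Python for a query node of basic type. This is an illustration under stated assumptions: the function name `result_at_basic_node`, the predicate interface and the sample l-values 1 to 16 are hypothetical (Fig. 7 is not reproduced here), and the formula is represented as an ordinary Python predicate on a proposed r-value.

```python
from itertools import count

def result_at_basic_node(pred, candidates, I, r):
    """Build [l | phi(l)] for a new node of basic type.

    pred(value, I, r) plays the role of the formula phi_v with the fresh
    l-value's r-value set to `value`; candidates is a finite superset of
    the possible r-values. Returns the new set of l-values and the
    extended r-value map.
    """
    used = set().union(*I.values())
    fresh = (l for l in count(1) if l not in used)   # unused l-values
    new_r = dict(r)
    new_ls = set()
    # sorted() only makes the arbitrary choice of l-values reproducible
    for value in sorted(c for c in set(candidates) if pred(c, I, r)):
        l = next(fresh)
        new_r[l] = value
        new_ls.add(l)
    return new_ls, new_r

# Assumed stand-in for the instance: persons at u, other nodes use 6..16.
I = {"u": {1, 2, 3, 4, 5}, "rest": set(range(6, 17))}
r = {1: "Rehoboam", 2: "Solomon", 3: "David", 4: "Batsheba", 5: "Jesse"}

def copies_u(value, I, r):
    """The formula of Example 5: some y in I(u) has the same r-value."""
    return any(r[l] == value for l in I["u"])

I_u_prime, r_star = result_at_basic_node(
    copies_u, [r[l] for l in I["u"]], I, r)
```

Each distinct candidate r-value receives exactly one fresh l-value, so duplicated values in I(u) would collapse in the result. The particular l-values assigned may differ from those in the paper's figures, which is harmless because the result is only defined up to isomorphism.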

Definition 8. A query $Q$ on a database with schema $S$ is safe
if for every instance $I$ of $S$ the result of the query exists.

The  following  lemma  shows  that  to  check  if  a 
query  is  safe,  it  suffices  to  check  the  results  at  the 
leaves. 

Lemma 3. A query $Q$ on database schema $S$ is safe iff for every
instance $I$ of the database, and for every node $v$ of the query of
type □, the set of candidate r-values for $v$ is finite. ∎

Lemma 4. Let $Q$ be a query on a database with schema $S$, and
let $w_1, \ldots, w_n$ be the nodes in the database of type □. $Q$
is safe iff there is a finite set $\{d_1, \ldots, d_k\}$ of elements
of $D$ such that for every instance $I$ of the database $S$, and for
every node $v$ of type □ in the query $Q$, all of the candidate
r-values for $v$ are either r-values of elements of the $I(w_i)$’s
or are among the $d_j$. ∎

The constants in the above lemma are those elements of $D$ that
are mentioned in any of the formulas $\phi_v$.
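The finite bound described by Lemma 4 is easy to compute for a given instance. The following Python sketch is illustrative only; the function name `candidate_bound` and the dictionary encoding are assumptions made here.

```python
BASIC = "basic"  # hypothetical tag for the basic type

def candidate_bound(mu, I, r, constants=()):
    """Return the finite set that bounds candidate r-values at basic nodes
    of a safe query: r-values stored at basic database nodes, plus the
    constants mentioned in the query's formulas."""
    values = {r[l] for v, t in mu.items() if t == BASIC for l in I[v]}
    return values | set(constants)

# Tiny assumed database: one basic node u and one composition node w.
mu = {"u": BASIC, "w": "composition"}
I = {"u": {1, 2}, "w": {3}}
r = {1: "Solomon", 2: "David", 3: (1, 2)}
bound = candidate_bound(mu, I, r, constants=["Absalom"])
```

A query evaluator only needs to test the formula of a basic query node against the members of this set, rather than against all of D.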

5  Algebra 

We now define an algebraic query language. This language is
equivalent to the logical query language. That is, each algebraic
query is equivalent to a logical query and vice versa.

The algebraic language consists of the following basic
operations:

(1) $w \leftarrow v$ creates a new node $w$ of the same type as
$v$, and with the same successors as $v$. $I(w)$ contains a copy of
each l-value in $I(v)$.

(2) $w \leftarrow \Box(d)$ creates a node $w$ of type □, that
contains a single l-value, whose r-value is $d$.

Example 6. Let the database be the genealogy of Fig. 6 with the
instance of Fig. 7. The operations $u' \leftarrow u$ and
$v' \leftarrow \Box(\text{“Absalom”})$ each add a new node, $u'$ and
$v'$ respectively, to the database. Their instances are shown in
Fig. 13 (a) and (b).
      I(u')                   I(v')

 l    r(l)               l    r(l)
 17   Rehoboam           17   Absalom
 18   Solomon
 19   David
 20   Batsheba
 21   Jesse

Figure 13  Examples of (a) $u' \leftarrow u$ (b) $v' \leftarrow \Box(d)$.

(3) $w \leftarrow O(v)$ creates a new node $w$ of type O with
successor $v$. $I(w)$ contains a copy of each possible subset of
$I(v)$.

(4) $w \leftarrow Q(v_1, \ldots, v_n)$ creates a new node $w$ of
type Q, with successors $v_1, \ldots, v_n$. $I(w)$ contains all
possible tuples with $i$-th component in $I(v_i)$.

(5) If $v$ is a node of type Q with $n$ successors, and
$i \mathrel{\theta} j$ is one of the relations $i \in j$,
$i \mathrel{\pi_k} j$, $i =_l j$, $i =_r j$ and $i =_r d$, then
$w \leftarrow \sigma_{i \theta j}(v)$ creates a node $w$ of the same
type as $v$ and with the same successors, that contains a copy of
each tuple from $I(v)$ whose $i$-th and $j$-th components are in the
specified relation.

(6) If $v_1, \ldots, v_n$ are all of the same type and have the
same successors, then $w \leftarrow \cup(v_1, \ldots, v_n)$




Figure 14  Example of $\Pi$.
Figure 15  Example of $\Pi(u')$.

Example 7. Suppose the database is the genealogy format with the
extra node $u'$ shown in Fig. 14 and with the instances in Fig. 7
($u$, $v$ and $w$) and Fig. 15 ($u'$). Then $x \leftarrow \Pi(u')$
creates the node $x$


creates another similar node $w$ that contains a copy of each
element that is in one of the $I(v_i)$’s.

(7) If $v_i$ and $v_j$ have the same type and the same successors,
then $w \leftarrow v_i - v_j$ creates another similar node $w$ that
contains a copy of each element of $I(v_i)$ whose r-value is not the
r-value of any element of $I(v_j)$.

(8) If $v$ is of type Q with successors $v_1, \ldots, v_n$, and
$S = \{i_1, \ldots, i_k\}$ is a subset of the set
$\{1, \ldots, n\}$, then $w \leftarrow \Pi_S(v)$ creates a new node
$w$ of type Q with successors $v_{i_1}, \ldots, v_{i_k}$, that
contains all projections of tuples in $I(v)$ onto these components.

(9) If $v$ is of type Q and has exactly one successor, $u$, then
$w \leftarrow \Pi(v)$ creates a new node $w$ similar to $u$, that
contains a copy of each element of $I(u)$ that is the component of
some element of $I(v)$.


in  Fig.  14,  with  I(x)  as  in  Fig.  15. 
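Several of the operations above can be prototyped in a few lines. The Python sketch below covers operations (1) to (4) and (6) to (8) only, and it is a rough illustration rather than the paper's construction: the class `Database`, its method names and the tuple/frozenset encoding of r-values are assumptions made here. It also assumes that every new node holds one l-value per distinct r-value, in line with Lemma 2 (iii).

```python
from itertools import chain, combinations, count, product

class Database:
    """Hypothetical encoding: I[node] is a set of l-values, r[l] its r-value."""

    def __init__(self, I, r):
        self.I = {v: set(ls) for v, ls in I.items()}
        self.r = dict(r)
        self._fresh = count(max(self.r, default=0) + 1)  # unused l-values

    def _add(self, w, r_values):
        # one fresh l-value per distinct r-value
        self.I[w] = set()
        for value in r_values:
            l = next(self._fresh)
            self.r[l] = value
            self.I[w].add(l)

    def copy(self, w, v):                  # (1) w <- v
        self._add(w, {self.r[l] for l in self.I[v]})

    def constant(self, w, d):              # (2) w <- basic node holding d
        self._add(w, {d})

    def collection(self, w, v):            # (3) every subset of I(v)
        ls = sorted(self.I[v])
        subsets = chain.from_iterable(
            combinations(ls, n) for n in range(len(ls) + 1))
        self._add(w, {frozenset(s) for s in subsets})

    def composition(self, w, *vs):         # (4) all tuples over I(v1) x ... x I(vn)
        self._add(w, set(product(*(sorted(self.I[v]) for v in vs))))

    def union(self, w, *vs):               # (6)
        self._add(w, {self.r[l] for v in vs for l in self.I[v]})

    def difference(self, w, vi, vj):       # (7) compare by r-value
        excluded = {self.r[l] for l in self.I[vj]}
        self._add(w, {self.r[l] for l in self.I[vi]} - excluded)

    def project(self, w, v, S):            # (8) S lists 1-based positions
        self._add(w, {tuple(self.r[l][i - 1] for i in S) for l in self.I[v]})

# Small assumed database: u holds three l-values, two with the same r-value.
db = Database({"u": {1, 2, 3}}, {1: "a", 2: "b", 3: "a"})
db.copy("c", "u")
db.constant("k", "z")
db.collection("s", "c")
db.composition("p", "c", "k")
db.union("un", "c", "k")
db.difference("d", "un", "c")
db.project("q", "p", [2])
```

Note how `collection` and `composition` store l-values of their successors inside the new r-values, so the results refer to existing nodes instead of copying their data.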

Main Theorem. The algebraic language and the logical language are
equivalent, i.e., every algebraic query is equivalent to a safe
logical query, and every safe logical query is equivalent to an
algebraic query.

Outline of Proof. The first direction of the theorem, that for
each algebraic operation there is an equivalent query, is shown by
creating, for each operation, a query that consists of one new node
$w$ with formula $\phi_w$. This query will be safe and will have the
same result as the corresponding algebraic operation. The details
are fairly straightforward and will not be given here.

For the second part of the theorem, we are given a safe query,
and we have to show that it can be simulated by a sequence of
algebraic operations. Let $Q$ be the query. We shall construct the
desired algebraic expression by induction on the nodes in the query
using the order $\prec$. Assume therefore that we have an algebraic
expression for each node that precedes a node $w$ in the query, and
we now want to find an algebraic expression that constructs the node
$w$. The formula corresponding to $w$ is $\phi_w(x_w)$. Let the
bound variables of $\phi_w$ be $x^1_{v_1}, \ldots, x^n_{v_n}$ (where
each variable is bound by exactly one quantifier). Since the query
is safe, there is a set $\{d_1, \ldots, d_k\}$ of elements of $D$,
such that each r-value of a node of type □ is either one of these
constants, or is an r-value of some node in the database of type □.

Our first step is to create a node $\bar{w}$ that represents the
domain of $w$, i.e., all the r-values that elements of $I(w)$ could
possibly have, if $\phi_w$ contained no restrictions apart from the
safeness requirement given above. We define $\bar{w}$ as follows.

(1) If $w$ is of type □, and $v_1, \ldots, v_m$ are all the nodes
in the database of type □, then define $\bar{w}$ as follows. (i) Let
$w_i \leftarrow v_i$ for $1 \le i \le m$. (ii) Let
$w_{m+i} \leftarrow \Box(d_i)$ for $1 \le i \le k$, where the
$d_i$’s are the constants listed above. (iii) Let
$\bar{w} \leftarrow w_1 \cup \cdots \cup w_{m+k}$.

(2) If $w$ is of type Q and its successors are $w_1, \ldots, w_k$,
let $\bar{w} \leftarrow w_1 \times \cdots \times w_k$.

(3) If $w$ is of type O and its successor is $u$, let
$\bar{w} \leftarrow O(u)$.

We  can  then  show: 

Lemma 5. Let $I(w)$ be the result of the given query at node $w$.
Then every $l$ in $I(w)$ must satisfy $r(l) \in r[I(\bar{w})]$. ∎

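Let the bound variables in $\phi_w$ be
$x^1_{v_1}, \ldots, x^n_{v_n}$. Let

$u \leftarrow v_1 \times \cdots \times v_n \times \bar{w}$,

and “label” each $v_i$ with the variable $x^i_{v_i}$. This enables
us to distinguish between two copies of the same node that came from
different variables. Also label $\bar{w}$ with $x_w$. We define
nodes $v_\psi$, for each well-formed subformula $\psi$ of $\phi_w$,
by induction on the size of $\psi$, as follows.

(1) If $\psi$ is an atomic formula of the form
$x^i_{v_i} \mathrel{\theta} x^j_{v_j}$, let
$v_\psi \leftarrow \sigma_{i \theta j}(u)$. If $\psi$ is of the form
$x^i_{v_i} \mathrel{\theta} x_w$, let
$v_\psi \leftarrow \sigma_{i \theta n+1}(u)$, and similarly for the
cases where $\psi$ is of the form
$x_w \mathrel{\theta} x^j_{v_j}$ and $x_w \mathrel{\theta} x_w$. In
each case $v_\psi$ has the same successors as $u$.

(2) If $\psi$ is $\psi_1 \lor \psi_2$, $v_{\psi_1}$ and $v_{\psi_2}$
may have different successors. We shall show below that $\bar{w}$ is
always a successor of each $v_\psi$. Let the common successors of
$v_{\psi_1}$ and $v_{\psi_2}$ be labeled
$x^{i_1}_{v_{i_1}}, \ldots, x^{i_k}_{v_{i_k}}$ and $x_w$. Let
$v'_{\psi_1} \leftarrow \Pi_{S_1}(v_{\psi_1})$ and
$v'_{\psi_2} \leftarrow \Pi_{S_2}(v_{\psi_2})$, where $S_1$ and
$S_2$ are the numbers of the components of $v_{\psi_1}$ and
$v_{\psi_2}$ corresponding to the common successors of $v_{\psi_1}$
and $v_{\psi_2}$. Then let
$v_\psi \leftarrow v'_{\psi_1} \cup v'_{\psi_2}$.

(3) If $\psi$ is $\lnot \psi'$, let
$x^{i_1}_{v_{i_1}}, \ldots, x^{i_k}_{v_{i_k}}$ and $x_w$ be the
labels of the successors of $v_{\psi'}$. Let

$u_{\psi'} \leftarrow \Pi_{\{i_1, \ldots, i_k, n+1\}}(u)$,

and $v_\psi \leftarrow u_{\psi'} - v_{\psi'}$.

(4) If $\psi$ is $(\exists x^{i_j}_{v_{i_j}})(\psi')$ and
$x^{i_1}_{v_{i_1}}, \ldots, x^{i_k}_{v_{i_k}}, x_w$ are the labels
of the successors of $v_{\psi'}$, assume that $v_{i_j}$ is the
$j$-th component of $v_{\psi'}$. Let

$v_\psi \leftarrow \Pi_{\{1, \ldots, k+1\} - \{j\}}(v_{\psi'})$.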

It can then be shown that:

Lemma 6. For each subformula $\psi$ of $\phi_w$,

(1) $v_\psi$ is of type Q.

(2) The successors of $v_\psi$ are $\bar{w}$ and the sorts of all
the bound variables that appear in $\psi$, except for those that are
bound by a quantifier in $\psi$.

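(3) Assume that the successors of $v_\psi$ are labelled
$x^{i_1}_{v_{i_1}}, \ldots, x^{i_k}_{v_{i_k}}$ and $x_w$. Then

$I(v_\psi) = [\, l \mid r(l) = (l_1, \ldots, l_{k+1})
  \land \models \psi(l_1, \ldots, l_{k+1})
  \land (l_1, \ldots, l_{k+1}) \in r[\Pi_{\{i_1, \ldots, i_k, n+1\}}(I(u))] \,]$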